Faster Adaptive Set Intersections for Text Searching
Authors
Abstract
The intersection of large ordered sets is a common problem in the evaluation of Boolean queries to a search engine. In this paper we engineer a better algorithm for this task, which improves on those proposed by Demaine, Munro and López-Ortiz [SODA 2000/ALENEX 2001] by using a variant of interpolation search. More specifically, our contributions are threefold. First, we corroborate and complete the practical study by Demaine et al. on comparison-based intersection algorithms. Second, we show that, in practice, replacing binary search and galloping (one-sided binary) search [4] with interpolation search improves the performance of each of the main intersection algorithms. Third, we introduce and test variants of interpolation search, which results in an even better intersection algorithm. Topics. Evaluation of Algorithms for Realistic Environments, Implementation, Testing, Evaluation and Fine-tuning of Algorithms, Information Retrieval.
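To make the idea concrete, the following is a minimal sketch of intersecting two sorted integer lists, where each "skip ahead" step uses a plain interpolation search instead of binary or galloping search. It is not the algorithm or any of the variants evaluated in the paper; the function names `interpolation_lower_bound` and `intersect` are hypothetical, and the sketch assumes each input is a strictly increasing list of integers (for example, document identifiers in a posting list).

```python
def interpolation_lower_bound(a, target, lo=0):
    """Return the smallest index i >= lo with a[i] >= target, or len(a) if none.

    Assumes `a` is sorted in increasing order and holds integers.
    """
    hi = len(a) - 1
    if lo > hi or a[hi] < target:
        return len(a)
    # Invariant: the answer lies in [lo, hi] and a[hi] >= target.
    while lo < hi:
        if a[lo] >= target:
            return lo
        # Here a[lo] < target <= a[hi], so the denominator is positive.
        # Guess the position by assuming values are spread evenly.
        pos = lo + (target - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        # Clamp below hi so that every iteration shrinks the range.
        pos = min(pos, hi - 1)
        if a[pos] < target:
            lo = pos + 1
        else:
            hi = pos
    return lo


def intersect(a, b):
    """Intersect two sorted lists, skipping ahead with interpolation search."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Jump in `a` to the first element not smaller than b[j].
            i = interpolation_lower_bound(a, b[j], i)
        else:
            # Jump in `b` to the first element not smaller than a[i].
            j = interpolation_lower_bound(b, a[i], j)
    return result


print(intersect([1, 3, 5, 7, 9, 1000], [3, 4, 9, 1000, 2000]))  # [3, 9, 1000]
```

Interpolation search guesses well when values are roughly evenly spread, but it can degrade on skewed data, so this sketch should not be read as a statement about the performance results reported in the paper.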
Similar resources
A New Chinese Text Compression Scheme Combining Dictionary Coding and Adaptive Alphabet-Character Grouping
In this paper, a new scheme is proposed for Chinese text compression. The factors, compression rate and decompression speed, are specially considered in order to help such applications as full-text searching. Actually, our scheme is based on the LZ77 scheme. The modifications made include alphabet-augmenting to obtain better compression rate, and adaptive-grouping to have faster processing spee...
Prediction of RO Membrane Performances by Use of Adaptive Network-Based Fuzzy Interference Systems
This study aims to develop an Adaptive Network-based Fuzzy Inference System technique (ANFIS) and using the parameters of a complex mathematical model in the RO membrane performances. The ANFIS was constructed by using a subtractive clustering method to generate initial fuzzy inference systems. The model was trained on 70% of the data set, and its validity was then examined on the remaining 30%. ...
A Family of Variable Step-Size Normalized Subband Adaptive Filter Algorithms Using Statistics of System Impulse Response
This paper presents a new variable step-size normalized subband adaptive filter (VSS-NSAF) algorithm. The proposed algorithm uses prior knowledge of the system impulse response statistics, and the optimal step-size vector is obtained by minimizing the mean-square deviation (MSD). In comparison with NSAF, the VSS-NSAF algorithm has faster convergence speed and lower MSD. To reduce the computa...
Tries for combined text and spatial data range search
We use tries to represent combined text and spatial data, and present a range search algorithm for reporting all 2-d points and rectangles from a set of size intersecting a query rectangle. Data and queries can include text. Our -d+ tries are evaluated experimentally (for up to 300,000) using uniform distributed random spatial data and randomly selected strings from a set of place names. For ra...
Enhancing GNU grep
The UNIX grep utility searches the input files selecting lines matching one or more patterns. Searching for patterns in text is an important operation in a number of domains, including program comprehension and software maintenance, structured text databases, indexing file systems, and searching natural language texts. Such a wide range of uses inspired the development of variations of the orig...